[SC-8983] Remove ydata-profiling library dependency by AnilSorathiya · Pull Request #333 · validmind/validmind-library

AnilSorathiya · 2025-03-09T20:20:43Z

Internal Notes for Reviewers

The codebase uses the ydata-profiling library in only a few places, and its new version shows a message that can confuse users of the vm-library.

github-actions · 2025-03-10T11:25:45Z

PR Summary

This pull request introduces several changes to the codebase:

Refactor Data Type Inference: The infer_datatypes function has been refactored and moved to validmind/utils.py. This function now includes enhanced logic for determining if a column is text-based using heuristics and pattern matching. It also provides detailed type information, including subtypes for numeric and text data.
Remove Unused Dependencies: The dependencies on ydata_profiling and related imports have been removed from the codebase. This includes the removal of ProfilingTypeSet and Settings from ydata_profiling in the DatasetDescription.py and Skewness.py files.
Test Data Update: The test data in test_DatasetDescription.py has been updated to include more realistic text examples, such as email addresses and longer text strings.
Dependency Cleanup: The poetry.lock and pyproject.toml files have been updated to remove the ydata-profiling dependency, along with other unused packages like dacite, htmlmin, imagehash, multimethod, phik, pywavelets, typeguard, visions, and wordcloud.

These changes aim to streamline the codebase by removing unnecessary dependencies and improving the robustness of data type inference.
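To make the data type inference idea concrete, here is a minimal, dependency-free sketch of the kind of heuristic the summary describes. It is not the code from validmind/utils.py: the function `infer_column_type`, the `EMAIL_RE` pattern, the thresholds, and the returned type labels are all assumptions chosen for illustration, and the sketch works on plain Python lists rather than DataFrame columns.

```python
import re
from typing import Any, Dict, Iterable, List

# Hypothetical pattern: the PR summary does not list the patterns actually used.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_text_column(
    values: Iterable[Any],
    min_avg_words: float = 3.0,      # assumed threshold
    pattern_threshold: float = 0.5,  # assumed threshold
) -> bool:
    """Guess whether string values look like free text rather than categories."""
    strings = [v for v in values if isinstance(v, str)]
    if not strings:
        return False
    # Share of values matching a known text-like pattern (here: email addresses).
    pattern_share = sum(bool(EMAIL_RE.match(s)) for s in strings) / len(strings)
    # Average number of whitespace-separated words per value.
    avg_words = sum(len(s.split()) for s in strings) / len(strings)
    return pattern_share >= pattern_threshold or avg_words >= min_avg_words


def infer_column_type(values: List[Any]) -> Dict[str, str]:
    """Return a type label (and a subtype for numeric data) for one column."""
    present = [v for v in values if v is not None]  # None marks a missing value
    if not present:
        return {"type": "Null"}  # all-null column
    if all(isinstance(v, bool) for v in present):
        return {"type": "Boolean"}
    # bool is a subclass of int, so exclude it explicitly from numeric data.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        subtype = "Integer" if all(isinstance(v, int) for v in present) else "Float"
        return {"type": "Numeric", "subtype": subtype}
    if all(isinstance(v, str) for v in present):
        return {"type": "Text"} if is_text_column(present) else {"type": "Categorical"}
    return {"type": "Unsupported"}
```

With these assumed thresholds, a column of email addresses or multi-word sentences is labelled as text, while a column of short repeated labels such as "red" and "blue" falls back to categorical. A DataFrame-based version would apply the same checks to each column in turn.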

Test Suggestions

Test the infer_datatypes function with a variety of DataFrame inputs to ensure it correctly identifies column types, including edge cases like all-null columns.
Verify that the is_text_column function accurately classifies text columns using different patterns and thresholds.
Run unit tests on the Skewness function to ensure it correctly uses the refactored infer_datatypes function.
Check that the removal of ydata_profiling and related imports does not affect the functionality of existing features.
Ensure that the updated test data in test_DatasetDescription.py produces the expected results.
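One way to support the import-removal check above is to scan source files for leftover imports of the removed package. The sketch below uses the standard library `ast` module. The helper `find_imports` and the sample source string are hypothetical and are not part of this PR.

```python
import ast
from typing import List


def find_imports(source: str, package: str) -> List[int]:
    """Return the line numbers of import statements that reference `package`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as "from . import x".
            names = [node.module or ""]
        else:
            continue
        # Match the package itself or any of its submodules.
        if any(n == package or n.startswith(package + ".") for n in names):
            hits.append(node.lineno)
    return sorted(hits)


# Illustrative snippet only, not the actual contents of any file in the repository.
SAMPLE = (
    "import pandas as pd\n"
    "from ydata_profiling import Settings\n"
    "import ydata_profiling\n"
)
```

Calling `find_imports(SAMPLE, "ydata_profiling")` reports lines 2 and 3. Running the same helper over each file in the package and asserting an empty result would flag any import that the cleanup missed.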

johnwalz97

beautiful 😍...

my only suggestion is maybe to break the utils.py file into separate files (dataset_utils.py, model_utils.py etc) since it's getting quite long

cachafla

🫡

Remove ydata-profiling library dependency

24162bb

AnilSorathiya added the internal label (Not to be externalized in the release notes) Mar 9, 2025

AnilSorathiya added 6 commits March 9, 2025 20:42

Merge branch 'main' into anilsorathiya/sc-8983/remove-ydata-profiling…

7c7a41a

…-library-dependency

poetry lock update

f688339

update infer_datatypes function to infer Text type

46920f8

update test

7ba18f9

remove lint error

bcda30f

remove print statements

02f0253

AnilSorathiya requested review from cachafla, johnwalz97 and juanmleng March 10, 2025 11:29

johnwalz97 approved these changes Mar 10, 2025

cachafla approved these changes Mar 11, 2025

AnilSorathiya merged commit f296019 into main Mar 11, 2025
6 checks passed

AnilSorathiya merged 7 commits into main from anilsorathiya/sc-8983/remove-ydata-profiling-library-dependency
